EN FR
EN FR


Section: New Software and Platforms

FP-Hadoop

Fast Parallel Hadoop

Keywords: Hadoop - Data parallelism

Functional Description: FP-Hadoop makes the reduce side of Hadoop MapReduce more parallel and efficiently deals with the problem of data skew in the reduce side. In FP-Hadoop, there is a new phase, called intermediate reduce (IR), in which blocks of intermediate values, constructed dynamically, are processed by intermediate reduce workers in parallel.  Our experiments using FP-Hadoop using synthetic and real benchmarks have shown excellent performance gains compared to native Hadoop, e.g. more than 10 times in reduce time and 5 times in total execution time.